| Participants | Patients within 24 hours of admission with acute coronary syndrome; average age approx. 64; multiple cardiovascular risk factors; patients with risk of bleeding excluded |
| Intervention | Clopidogrel in addition to aspirin and other aspects of standard care |
| Comparator | Aspirin and other aspects of standard care |
| Outcome | Primary endpoint: cardiovascular death, nonfatal MI or stroke; important safety endpoint: major bleed |
The first primary outcome … occurred in 9.3 percent of the patients in the clopidogrel group and 11.4 percent of the patients in the placebo group (relative risk with clopidogrel as compared with placebo, 0.80; 95 percent confidence interval, 0.72 to 0.90; \(p < 0.001\)).
What do the statistical statements mean (\(p\) values, confidence intervals)?
## [1] 0 1 0 1 1 0 1 1 1 0
In this series of tosses, there were 6 \(H\) and 4 \(T\).
If we conduct the 10-flip experiment a second time, we get:
## [1] 0 1 1 0 0 0 0 1 0 0
This time we see 3 \(H\) and 7 \(T\).
## [1] 5 8 5 5 7 5 7 4 7 4
The mean number of \(H\) in this series was 5.7.
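The coin-toss series above look like output from a simulation. A minimal sketch of such a simulation is shown here, assuming a fair coin coded as 1 for \(H\) and 0 for \(T\). The helper name `flip_coin` and the seed are hypothetical choices made for illustration, and a different random number generator will not reproduce the exact sequences shown above.

```python
import random


def flip_coin(n_flips, p_heads=0.5, rng=random):
    """Return n_flips outcomes, coded 1 for heads and 0 for tails."""
    return [1 if rng.random() < p_heads else 0 for _ in range(n_flips)]


# Arbitrary seed (an assumption), used only so that reruns give the same flips.
rng = random.Random(2001)

# One experiment of 10 flips.
first_run = flip_coin(10, rng=rng)

# Ten repeats of the 10-flip experiment, recording the number of heads in each.
heads_per_run = [sum(flip_coin(10, rng=rng)) for _ in range(10)]
mean_heads = sum(heads_per_run) / len(heads_per_run)

print(first_run, sum(first_run))
print(heads_per_run, mean_heads)
```

Rerunning with other seeds shows how much the number of \(H\) varies between experiments of only 10 flips, even when the coin is fair.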
Imagine you have a coin of unknown bias (i.e. the probability of \(H\) is unknown: the coin may be fair, favour \(T\) or favour \(H\)). What test could you conduct to assess whether the coin is fair?
Attempt to describe the hypothesis you are testing and the statistical model you are using for the test.
We want to identify the coin as biased if \(P(H) \le 0.45\) or \(P(H) \ge 0.55\).
| Number of tosses | Expected number of heads if -5% bias (\(P(H) = 0.45\)) | Expected number of heads if +5% bias (\(P(H) = 0.55\)) | Probability of observing a number of heads outside this range, assuming the coin is fair |
|---|---|---|---|
| 10 | 4 | 6 | 0.5488281 |
| 50 | 22 | 28 | 0.4010620 |
| 100 | 45 | 55 | 0.3197273 |
| 200 | 90 | 110 | 0.1581653 |
| 500 | 225 | 275 | 0.0253962 |
| 1000 | 450 | 550 | 0.0015611 |
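The final column of the table can be sketched with an exact binomial calculation for a fair coin. The rule used below, counting a result as outside the range when the number of heads is strictly below the lower count or at or above the upper count, is an inference from the tabulated numbers rather than something stated in the text; it does reproduce the 10-toss value of 0.5488281. The function name `prob_outside_range` is hypothetical.

```python
from math import comb


def prob_outside_range(n, lower, upper):
    """P(X < lower or X >= upper) for X ~ Binomial(n, 0.5), computed exactly."""
    # Count the equally likely head/tail sequences with lower <= heads < upper.
    inside = sum(comb(n, k) for k in range(lower, upper))
    return 1 - inside / 2**n


# (number of tosses, lower count, upper count), taken from the table above.
rows = [(10, 4, 6), (50, 22, 28), (100, 45, 55),
        (200, 90, 110), (500, 225, 275), (1000, 450, 550)]

for n, lower, upper in rows:
    print(n, round(prob_outside_range(n, lower, upper), 7))
```

The point of the table survives whichever boundary convention is used: with few tosses a fair coin often lands outside the range, so such a result says little about bias, while with 1000 tosses it would be very unexpected.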
The null hypothesis is the statistical hypothesis that there is no (important) difference between experimental treatment and control in relation to the primary endpoint.
is the distribution of results expected assuming that a particular hypothesis about the effect size is true (e.g. the null hypothesis), that all the assumptions associated with the statistical model are true, and that the trial is conducted as planned.
\(\alpha\) is the pre-test probability of rejecting the null hypothesis when the null hypothesis is true. It is usually set at 0.05.
\(\beta\) is the pre-test probability of accepting the null hypothesis when the alternative hypothesis is true.
Power is the pre-study probability that the study will produce a statistically significant result for a given sample size and postulated effect size.
The \(p\) value is a measure of the compatibility of the observed data with the data that would be expected if the null hypothesis were true and all other statistical and methodological assumptions were met.
A confidence interval is an estimate of the range of effect sizes that is considered compatible with the observed data, assuming the statistical and methodological assumptions of the study are met.
If the study was repeated many times, and the same procedure was used to calculate the 95% confidence interval, then in the long run you would expect the calculated 95% confidence intervals to include the true value of the parameter 95% of the time.
A 95% confidence interval provides the range of values that are not statistically different from the observed point estimate at the 0.05 level.
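The long-run reading of a 95% confidence interval can be illustrated by simulation. The sketch below returns to the coin example: it repeats an experiment many times, calculates an interval each time, and counts how often the interval includes the true \(P(H)\). The interval formula (a normal-approximation interval for a proportion), the sample sizes and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def wald_interval(heads, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (an assumed method)."""
    p_hat = heads / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_p, n_flips, n_experiments, seed=1):
    """Fraction of repeated experiments whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        heads = sum(rng.random() < true_p for _ in range(n_flips))
        low, high = wald_interval(heads, n_flips)
        hits += low <= true_p <= high
    return hits / n_experiments


print(coverage(true_p=0.5, n_flips=200, n_experiments=2000))
```

The printed fraction should be close to 0.95. The 95% describes the procedure across many repetitions, not any single calculated interval.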
| | Statistically significant result | Non-statistically significant result |
|---|---|---|
| Adequately powered test | Reject the null. Accept the alternative hypothesis | The test failed to reject the null. Either the null is true or the effect size is smaller than was tested |
| Underpowered test | Provisionally accept the alternative hypothesis | Underdetermined result. The test is unable to detect effect sizes that might be important. |
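The difference between an adequately powered and an underpowered test can be sketched with the coin example. The code below estimates the probability of rejecting the hypothesis of a fair coin when the true \(P(H)\) is 0.55. The rejection rule (a symmetric two-sided region with at most \(\alpha/2\) in each tail under a fair coin) and the function names are assumptions chosen for illustration, not a rule taken from the text.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k heads, computed on the log scale for stability."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def rejection_cutoff(n, alpha=0.05):
    """Largest k with P(X <= k) <= alpha / 2 for a fair coin, or -1 if none."""
    cumulative, k = 0.0, -1
    while cumulative + binom_pmf(k + 1, n, 0.5) <= alpha / 2:
        k += 1
        cumulative += binom_pmf(k, n, 0.5)
    return k


def power(n, p_true, alpha=0.05):
    """Probability of rejecting 'the coin is fair' when P(H) = p_true."""
    k = rejection_cutoff(n, alpha)
    # Assumed rule: reject when heads <= k or heads >= n - k.
    return sum(binom_pmf(x, n, p_true)
               for x in range(n + 1) if x <= k or x >= n - k)


for n in (10, 50, 100, 200, 500, 1000):
    print(n, round(power(n, 0.55), 3))
```

Under this rule, power is very low with 10 tosses and high with 1000, which is why a non-significant result from a small experiment leaves the question of bias underdetermined.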
Precise, but confusing
If the study was repeated many times, and the same procedure was used to calculate the 95% confidence interval, then in the long run you would expect the calculated 95% confidence intervals to include the true value of the parameter 95% of the time
A 95% confidence interval provides the range of values that are not statistically different from the observed point estimate at the 0.05 level
Less precise, but useful
The confidence interval provides a range of plausible values for the unknown parameter
The lower limit is a likely lower bound estimate of the parameter; the upper limit a likely upper bound
Incorrect and misleading
You can be 95% confident that the true value lies within the observed confidence interval
The 95% confidence interval has a 95% chance of including the true effect size
Confidence intervals provide a range of plausible values for the parameter. If the truth is closer to the worst-case scenario, would that be a problem?
If the confidence interval doesn’t include the value of the parameter assumed in the null hypothesis, the finding will also be statistically significant
How should we interpret these results?
The null hypothesis is that there is no difference in outcome between participants who received clopidogrel plus aspirin and participants who received aspirin alone.
The null hypothesis provides a statistical model that can be tested by the study. An important part of the statistical model for the null hypothesis is that the true relative risk for the primary endpoint is 1 (i.e. no difference in primary endpoint event rates between participants who received clopidogrel and aspirin and those who received aspirin alone).
The \(p\) value is below our arbitrary cut-off for “rejecting the null hypothesis”. The observed data (relative risk 0.80) would be unexpected if the null hypothesis and associated statistical model were true.
Inference: clopidogrel in addition to aspirin reduces the risk of event(s) within the primary endpoint (in the participants in the trial, treated under the conditions of the trial)
The 95% confidence interval for the CURE primary endpoint is RR 0.72–0.90
Strictly: the confidence interval for the observed data is 0.72–0.90. If we repeated CURE many times and each time calculated a confidence interval in the same way, we would expect 95% of these confidence intervals to include the true effect size in the long run
Less strictly: a plausible range of values we could expect for the benefits of clopidogrel in addition to aspirin in the kinds of patients included in CURE is an RR of 0.72–0.90
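The link between the reported confidence interval and the reported \(p\) value can be checked with a rough back-calculation. The sketch below assumes the 95% interval was constructed as \(\log(\mathrm{RR}) \pm 1.96 \times \mathrm{SE}\) and uses a normal approximation. The trial's own analysis may have used a different method and the published figures are rounded, so this is an approximation and not a reproduction of the CURE analysis.

```python
from math import erfc, log, sqrt

# Values reported for the CURE primary endpoint.
rr, ci_low, ci_high = 0.80, 0.72, 0.90

# Standard error of log(RR), assuming the interval is log(RR) +/- 1.96 * SE.
se_log_rr = (log(ci_high) - log(ci_low)) / (2 * 1.96)

# Distance of the estimate from the null value, log(1) = 0, in standard errors.
z = log(rr) / se_log_rr

# Two-sided p value under the normal approximation.
p_two_sided = erfc(abs(z) / sqrt(2))

print(round(z, 2), p_two_sided)
```

The estimate sits almost four standard errors from the null value of RR = 1, which agrees with the reported \(p < 0.001\) and with an interval that excludes 1.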
The most commonly used cut-off for \(p\) values in clinical research is 0.05.
If the \(p\) value is \(< 0.05\), the result will be considered statistically significant
A statistically significant result for the primary endpoint of a trial is more trustworthy than statistically significant results on secondary endpoints or subgroups, because the trial was set up to test the primary endpoint.
Once you have determined that the primary endpoint of a trial was statistically significant, the next question is whether the magnitude of the effect is clinically significant.
The confidence interval can help you to make that judgment
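One way to use the confidence interval for this judgment is to translate the relative risks into absolute terms. The sketch below uses the event rates and confidence limits quoted from CURE. Applying the relative-risk limits to the comparator event rate of 11.4% is a simplifying assumption, so the resulting range is only a rough guide.

```python
risk_control = 0.114   # primary endpoint rate in the comparator group
risk_treated = 0.093   # primary endpoint rate in the clopidogrel group
rr_low, rr_high = 0.72, 0.90  # reported 95% confidence limits for the RR

# Observed absolute risk reduction (difference in event rates).
arr = risk_control - risk_treated

# Rough absolute reductions implied by the confidence limits, assuming the
# comparator risk of 11.4% applies (a simplifying assumption).
arr_best = risk_control * (1 - rr_low)
arr_worst = risk_control * (1 - rr_high)

print(f"Observed: {arr * 1000:.0f} fewer events per 1000 patients")
print(f"Plausible range: {arr_worst * 1000:.0f} to {arr_best * 1000:.0f} per 1000")
```

The observed difference is about 21 fewer primary endpoint events per 1000 patients treated, with a rough plausible range of about 11 to 32 per 1000. Whether the lower end of that range would still justify treatment, given the risk of major bleeding, is the clinical judgment.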
The CURE Investigators. 2001. “Effects of Clopidogrel in Addition to Aspirin in Patients with Acute Coronary Syndromes Without ST-Segment Elevation.” New England Journal of Medicine 345 (7): 494–502.